AITopics | mean reward

Bandit Social Learning under Myopic Behavior

Neural Information Processing Systems | Apr-25-2026, 19:58:43 GMT

We study social learning dynamics motivated by reviews on online platforms. The agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals for the arms' expected rewards. We derive stark exploration failures for any such behavior, and provide matching positive results. As a special case, we obtain the first general results on failure of the greedy algorithm in bandits, thus providing a theoretical foundation for why bandit algorithms should explore.
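The failure of the greedy algorithm mentioned in this abstract can be illustrated with a small simulation. This is only a minimal sketch under assumptions that are not in the abstract: Bernoulli rewards, one initial sample per arm, ties broken toward the lowest arm index, and "failure" defined as the best arm receiving fewer than half of the pulls. The function names `run_greedy` and `greedy_failure_rate` are hypothetical.

```python
import random


def run_greedy(means, horizon, rng):
    """Pull each arm once, then always pull the arm with the best empirical mean.

    Assumes Bernoulli rewards with the given means (an illustrative choice).
    Returns the number of pulls per arm.
    """
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # one initial sample per arm
        else:
            # Purely myopic choice: no exploration bonus of any kind.
            arm = max(range(k), key=lambda a: sums[a] / counts[a])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


def greedy_failure_rate(means, horizon, runs, seed=0):
    """Fraction of runs in which the best arm gets fewer than half of the pulls."""
    rng = random.Random(seed)
    best = max(range(len(means)), key=lambda a: means[a])
    failures = 0
    for _ in range(runs):
        counts = run_greedy(means, horizon, rng)
        if counts[best] < horizon / 2:
            failures += 1
    return failures / runs


if __name__ == "__main__":
    # Two arms with close means; the better arm is arm 1.
    print(greedy_failure_rate([0.5, 0.6], horizon=200, runs=200))
```

In this sketch, if the better arm's first sample is 0, its empirical mean stays at 0 and the greedy rule never returns to it, so a constant fraction of runs gets stuck on the worse arm regardless of the horizon.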

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (0.93)
Europe > United Kingdom > England (0.28)

Genre: Instructional Material (0.46)

Industry: Education > Curriculum (0.61)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.88)

A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms

Neural Information Processing Systems | Apr-25-2026, 18:05:27 GMT

One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB) problem is the difference between mean rewards in the top two arms, also known as the instance gap. The celebrated Upper Confidence Bound (UCB) policy is among the simplest optimism-based MAB algorithms that naturally adapts to this gap: for a horizon of play n, it achieves optimal O(log n) regret in instances with "large" gaps, and a near-optimal O(√(n log n)) minimax regret when the gap can be arbitrarily "small." This paper provides new results on the arm-sampling behavior of UCB, leading to several important insights. Among these, it is shown that arm-sampling rates under UCB are asymptotically deterministic, regardless of the problem complexity.
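For readers unfamiliar with the UCB policy referred to here, the following is a minimal sketch of the textbook UCB1 rule. It is not necessarily the exact variant analyzed in the paper: the exploration bonus sqrt(2 ln t / N_a), the Bernoulli reward model, and the function name `ucb1` are all assumptions made for illustration.

```python
import math
import random


def ucb1(means, horizon, rng):
    """Textbook UCB1 on Bernoulli arms (illustrative reward model).

    Each arm is pulled once; afterwards the arm maximizing
    empirical mean + sqrt(2 * ln(t) / pulls) is chosen.
    Returns the number of pulls per arm.
    """
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization round: one pull per arm
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    # "Large" gap: the inferior arm should be sampled comparatively rarely.
    print(ucb1([0.2, 0.8], horizon=2000, rng=rng))
    # Zero gap: inspect how the pulls are split between identical arms.
    print(ucb1([0.5, 0.5], horizon=2000, rng=rng))
```

Printing the pull counts for instances with different gaps gives a rough empirical view of the arm-sampling rates the abstract discusses; the simulation itself proves nothing about the asymptotic claims.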

artificial intelligence, data mining, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.91)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms

Neural Information Processing Systems | Apr-25-2026, 18:05:24 GMT

One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB) problem is the difference between mean rewards in the top two arms, also known as the instance gap. The celebrated Upper Confidence Bound (UCB) policy is among the simplest optimism-based MAB algorithms that naturally adapts to this gap: for a horizon of play n, it achieves optimal O(log n) regret in instances with "large" gaps, and a near-optimal O(√(n log n)) minimax regret when the gap can be arbitrarily "small." This paper provides new results on the arm-sampling behavior of UCB, leading to several important insights. Among these, it is shown that arm-sampling rates under UCB are asymptotically deterministic, regardless of the problem complexity.

artificial intelligence, data mining, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.95)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Finite-Time Logarithmic Bayes Regret Upper Bounds

Neural Information Processing Systems | Apr-24-2026, 19:33:57 GMT

We derive the first finite-time logarithmic Bayes regret upper bounds for Bayesian bandits. In a multi-armed bandit, we obtain O(c_Δ log n) and O(c_h log² n) upper bounds for an upper confidence bound algorithm, where c_h and c_Δ are constants depending on the prior distribution and the gaps of bandit instances sampled from it, respectively. The latter bound asymptotically matches the lower bound of Lai (1987). Our proofs are a major technical departure from prior works, while being simple and general. To show the generality of our techniques, we apply them to linear bandits. Our results provide insights on the value of the prior in the Bayesian setting, both in the objective and as side information given to the learner. They significantly improve upon existing Õ(√n) bounds, which have become standard in the literature despite the logarithmic lower bound of Lai (1987).

bandit, data mining, machine learning, (22 more...)

Neural Information Processing Systems

Country: North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.48)

Technology: